
JS: Support a taint tracking for arguments of .apply() function call #6559

Draft · wants to merge 9 commits into base: main
Conversation


@yuske yuske commented Aug 26, 2021

Implemented DataFlow::Nodes for arguments of .apply function calls and propagation of tainted values from the argsArray parameter to these arguments. Fixed small issues in array initialization and improved unit tests; see arrays-init.js and call-apply.js.
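For readers unfamiliar with the pattern being modeled, here is a minimal, hypothetical illustration (not taken from the PR's test files) of the flow this change is meant to track: a tainted element of the argsArray reaching a function parameter through .apply():

```javascript
// A stand-in for a function that receives attacker-controlled data.
function render(html) {
  return html; // `html` receives the tainted array element
}

const source = "<img src=x onerror=alert(1)>"; // stands in for a tainted value
const args = [source];

// args[0] flows into `render`'s first parameter via .apply()
const result = render.apply(null, args);
```

The PR's goal is for the analysis to report flow from `source` to any sink that `result` reaches.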

@erik-krogh, take a look at this PR. We discussed this one in Slack.

@yuske yuske requested a review from a team as a code owner August 26, 2021 14:17
@github-actions github-actions bot added the JS label Aug 26, 2021
@erik-krogh erik-krogh self-assigned this Aug 26, 2021
@erik-krogh
Contributor

Impressive work.

For now you need to run the autoformatter on a bunch of files; that is what is causing the CI to fail.
These files:

ql/javascript/ql/lib/semmle/javascript/dataflow/internal/FlowSteps.qll
ql/javascript/ql/lib/semmle/javascript/dataflow/internal/DataFlowNode.qll
ql/javascript/ql/lib/semmle/javascript/dataflow/DataFlow.qll
ql/javascript/ql/test/library-tests/Arrays/DataFlow.ql

And I definitely need to run some performance tests on this, and I unfortunately suspect that these tests will show significant performance regressions.

I'm especially looking at the removal of i < 5 and.
Do you really need that part? I suspect it causes a significant slowdown for very little benefit.


Unfortunately I don't think I'll have time to take a close look at your PR this week, but I'll definitely spend some time on it Monday.

@yuske
Author

yuske commented Aug 26, 2021

Thanks, @erik-krogh!

I'm especially looking at the removal of i < 5 and.

I did this fix when I added the tests in arrays-init.js. Otherwise, we get false-negative cases for sink(arr[5]) on line 61. It is not necessary for supporting the .apply() function, at least when there are fewer than 5 arguments. Can you do perf tests with and without this conjunct? If we get noticeable perf degradation, I will revert it and fix these false negatives in another way later. I think the pseudo-property for an array element requires rewriting anyway, because we have false positives in other test cases.

@erik-krogh
Contributor

Performance doesn't look good, and it's caused by TApplyArgumentNode.

Let's look at the definition of that type.

TApplyArgumentNode(MethodCallExpr ce, Function func, int i) { exists(func.getParameter(i)) }

The way such a newtype works is that a database row is created for each combination of values there exists for the "parameters" of the type.
So in this case CodeQL will create a few rows for each pair of Function and MethodCallExpr, which can quickly blow up as it's basically quadratic in the size of the program.
For example, for the medium-sized project gatsby, CodeQL has to create hundreds of millions of rows, which caused an out-of-memory error on my machine.
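The blowup is easy to see with some back-of-the-envelope arithmetic. The function count below is taken from a later comment in this thread (gatsby has 22793 functions); the method-call count and rows-per-pair are assumed round numbers for illustration:

```javascript
// Toy model of the newtype's cross-product semantics: roughly one tuple per
// (MethodCallExpr, Function, parameter index) combination.
const methodCalls = 10000;   // assumed: method calls in a medium-sized project
const functions = 22793;     // from gatsby, per a later comment in this thread
const rowsPerPair = 2;       // assumed: "a few rows" per (call, function) pair

const tuples = methodCalls * functions * rowsPerPair;
console.log(tuples); // hundreds of millions of rows
```

This matches the order of magnitude described above, and explains the out-of-memory error.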

You can try it out yourself by running any data-flow query on the gatsby project. (E.g. Xss.ql).

So we need to do it some other way.
I would look into DataFlow::FunctionReturnNode. That node "collects" all the possible returns from a function into a single dataflow node.
I think we can do something similar for parameters (specifically for .apply() calls).

An (untested) idea I have is to create a new dataflow node associated with each function. This new node would hold the "array" from an .apply() call.
E.g. a TReflectiveParametersNode(Function f).
We then add a step from the .apply() array to the ReflectiveParametersNode, and add loadStep()s loading each element from the ReflectiveParametersNode into the parameters for that function.
If we can get that to work, then we will have a very limited number of dataflow nodes, and I think we can still get the same precision.
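The flow this per-function node would model can be sketched in plain JavaScript (this mirrors the untested idea above; the analysis-level node and steps are what the comment proposes, not shown here):

```javascript
const source = "tainted"; // stands in for a tainted value

// Under the proposed model, the whole argument array flows into one
// ReflectiveParametersNode for `f`, and each parameter is a load of its
// index from that node.
function f(a, b) {
  return b; // parameter `b` corresponds to index 1 of the "array"
}

// The array passed to .apply() is the thing that flows to the per-function node.
const result = f.apply(null, ["safe", source]);
```

With one node per function plus one load step per parameter, the node count stays linear in the program size.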

Something entirely different might also work, it was just a quick idea I had.

Let me know if you need help with that, or if you want me to give it a go.


The javascript/ql/test/library-tests/DataFlow/tests.ql is failing, which seems to be caused by the DataFlow::ApplyArgumentNode class not having any result for getBasicBlock.

@yuske
Author

yuske commented Aug 30, 2021

Thank you for the update, @erik-krogh!

I understand now how newtype works; probably if I add restrictions on ce and a relation between MethodCallExpr and Function, that should already improve the situation. I will try to implement the ideas you suggest and test them on the gatsby project during this week, but unfortunately not today. Keep in touch!

@erik-krogh
Contributor

probably if I add restrictions on ce and a relation between MethodCallExpr and Function, that should already improve the situation.

If you go that route I think you'll end up with recursion between the callgraph and the definition of DataFlow nodes, which will likely result in non-monotonic recursion.

I'm looking forward to seeing what you come up with.

@yuske
Author

yuske commented Sep 7, 2021

Hej @erik-krogh! I'm back to work on the PR. I did a fix to avoid a combinatorial explosion in TApplyArgumentNode. You are right that non-monotonic recursion does not allow us to relate MethodCallExpr with Function in TApplyArgumentNode. I used the Function there only to determine the exact number of parameters for each function. So I just upper-bounded it by the maximum parameter count and removed the Function from TApplyArgumentNode. Now we have some overhead due to the maximum parameter count, but we create relations only for .apply function calls. I think it is a good trade-off.
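The bounding idea can be illustrated in plain JavaScript (the predicate name getMaxNumFunctionParameter comes from the discussion below; Function.prototype.length is used here as a stand-in for QL's getNumParameter()):

```javascript
// Three sample functions with different arities.
function f1(a) {}
function f2(a, b, c) {}
function f3(a, b) {}

// Instead of pairing each .apply() call with a specific Function (which is
// quadratic and recursive through the call graph), bound the argument index
// by the largest parameter count seen anywhere in the program.
const maxNumFunctionParameter = Math.max(f1.length, f2.length, f3.length);
console.log(maxNumFunctionParameter); // 3
```

TApplyArgumentNode(MethodCallExpr ce, int i) then only needs i in [0 .. maxNumFunctionParameter), and only for .apply call sites, so the node count is small.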

I guess that your suggestion with TReflectiveParametersNode(Function f) should work as well. However, we would need to create additional relations for each function, and that could require more computation. I will try it if the perf tests are not satisfactory.

I tested the last fix on the gatsby project and Xss.ql as you suggested, and I see the same time for the main and my branches (73-75 sec on my laptop). I am not sure about the usage of a cached predicate for the expression max(Function f | | f.getNumParameter()). I want it to execute once for TApplyArgumentNode, but maybe the query optimizer does that without using cached. Btw, it would be great to look at a query plan, as in SQL engines, to estimate how many times CodeQL executes a given predicate. Do you have any tools for such performance analysis? I noticed that the VSCode CodeQL plugin has a "Running Queries: Debug" option that "is useful for debugging performance". Do you have any docs on how to use these logs?

@erik-krogh
Contributor

Btw, it would be great to look at a query plan, as in SQL engines, to estimate how many times CodeQL executes a given predicate. Do you have any tools for such performance analysis? I noticed that the VSCode CodeQL plugin has a "Running Queries: Debug" option that "is useful for debugging performance". Do you have any docs on how to use these logs?

I don't think we have much documentation. There is some here, but I don't think it's that useful in this case.

What I tend to do is have the option Running Queries: Debug enabled (as you mentioned).
If you additionally set a custom directory with the Custom Log Directory option, then massive log files will appear in that directory.
These log files include everything that was executed, and quite detailed timing information about how long each predicate took (and how many times recursive predicates were executed).
Reading the relational algebra takes some getting used to, but if you are used to query plans from SQL engines, then it should be somewhat intuitive.


I'll kick off a performance evaluation, and then take a closer look at your code when that comes back.

@erik-krogh
Contributor

erik-krogh commented Sep 9, 2021

A performance evaluation came back, and it looks somewhat OK.
I actually didn't expect that, given that we've previously seen bad performance from those exact changes.
(The CodeQL CLI team has done lots of optimization work since then; I suppose it worked.)

Of the projects I benchmarked on, the bwip-js project had the biggest performance regression (about 30% when running the security suite).
Two notes: 1) I suspect it's your i < 5 and change that is to blame (though I have no proof). 2) bwip-js is one of the usual suspects when it comes to weird performance.
I can take a look at exactly why bwip-js performs worse if you want to.
Otherwise, it's an opportunity to dive further into CodeQL performance.

Your implementation looks good, given the algorithm you've decided to implement.
A few comments for now:

  • I would like to see a max for i in TApplyArgumentNode (to avoid a worst-case O(n^2) blowup).
  • There should be no need to cache the getMaxNumFunctionParameter predicate.
    cached means cached to disk; the predicate should be computed just once during any query run anyway.
    (Unless it's inlined; you can use pragma[noinline] to prevent that, but there should be no need.)
  • You've flipped how we usually use the // OK and // NOT OK comments.
    (We usually use // NOT OK to indicate that there is flow from a source to a sink.)

I've become more interested in exploring my TReflectiveParametersNode(Function f) idea, so I think I'll give that a go myself.
I'll report back later how it went.
Update: I got it to work, I'll post a link sometime tomorrow.

@erik-krogh
Contributor

I've become more interested in exploring my TReflectiveParametersNode(Function f) idea, so I think I'll give that a go myself.
I'll report back later how it went.

Here is my experiment: 9d96260

With a ReflectiveParametersNode I was also able to support the arguments object in a nice way. (See the test case at the bottom of call-apply.js.)
I don't think we can support such a scenario with your approach, and I've encountered real vulnerabilities where such flow is relevant (CVE-2020-7755).
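The arguments-forwarding pattern being referred to looks roughly like this (a hypothetical sketch; the actual test case is at the bottom of call-apply.js in the experiment commit):

```javascript
// A wrapper that forwards its entire `arguments` object to another function.
function wrapper() {
  return target.apply(this, arguments); // all arguments pass through unchanged
}

function target(x) {
  return x; // taint in the first argument reaches `x`
}

const result = wrapper("tainted");
```

Because the `arguments` object is itself an array-like flowing into .apply(), modeling the whole array as one per-function node handles this case naturally, whereas per-argument nodes tied to a specific call site do not.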

I'll get a performance evaluation going on my own experiment.

@yuske
Author

yuske commented Sep 10, 2021

Thanks for sharing your approach. I will try to reproduce my tests on your code. At first look, TReflectiveParametersNode(Function f) looks like the first approach, where we tried to pass arguments directly to the function parameters. I thought that you wanted to create separate nodes for each argument of the reflective function (i.e. for any function, because it could be called by .apply()), like TReflectiveParametersNode(Function f, int i). Maybe I missed something; I will test it carefully and come back.

A good point about arguments! I am pretty sure it is possible to add that to my approach as well, but let's decide on the final approach first.

I've encountered real vulnerabilities where such flow is relevant (CVE-2020-7755).

Great! I started working on .apply support because several real vulnerabilities from my data set were not detected and required taint propagation via .apply :)

I can take a look at exactly why bwip-js performs worse if you want to.

I can investigate this issue later, when we choose an approach. I'd like to ask how you run your performance benchmarks, because I just run queries from VSCode and compare the time reported in the UI. Probably it is not the best way; I see about a 10% difference from run to run on the same data. Do you run it from the CLI with cache invalidation, maybe with some other important parameters?

@erik-krogh
Contributor

erik-krogh commented Sep 10, 2021

I can investigate this issue later, when we choose an approach. I'd like to ask how you run your performance benchmarks, because I just run queries from VSCode and compare the time reported in the UI. Probably it is not the best way; I see about a 10% difference from run to run on the same data. Do you run it from the CLI with cache invalidation, maybe with some other important parameters?

We got some internal tools for measuring performance across many projects and queries.
But when digging down into a specific issue, I still use VSCode (with debug enabled in the CodeQL settings, as previously mentioned).

I usually run VSCode once with a baseline and once with the PR I want to test.
When measuring performance I usually do CodeQL: Restart Query Server and CodeQL: Clear Cache in VSCode before and between runs. (Sometimes I restart VSCode if RAM usage is high.)

I sometimes fix the CPU frequency, otherwise thermals will have a big impact on performance (if you're using a laptop).
On Ubuntu I use sudo cpupower frequency-set -u 3.5GHz.

When I then have two output logs, I usually open them both in VSCode side by side and start by looking at the Clause timing report: sections in the output logs (starting from the bottom of each log).
Most of the time performance regressions show up as pretty clear outliers in the Clause timing report.

@yuske
Author

yuske commented Sep 10, 2021

That's very helpful. Thanks!
I'll continue looking at the tests and come back soon.

@yuske
Author

yuske commented Sep 10, 2021

Took a look at your code and tests in detail.
You are right about the case sink(foo1_apply([source, ""])). It works in my code for just one level, because I use separate nodes for each argument instead of properties of a single data-flow node TReflectiveParametersNode(Function f). The following test does not work:

function foo1_apply(arr) {
  return foo1_apply2(this, arr);
}

function foo1_apply2(arr) {
  return foo1.apply(this, arr);
}

Looking forward to the perf test results. I got some numbers from the gatsby project: it has 16 .apply method calls, a maximum of 18 parameters, and 22793 functions. So we should get fewer entities of TApplyArgumentNode(MethodCallExpr ce, int i) than of TReflectiveParametersNode(Function f). Let's see which one affects the perf more.

Seems we forgot about the this argument in the .apply() call. This code, sink(Array.prototype.slice.apply(source)), doesn't work. I can add it later.
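Concretely, the missing flow is through the receiver: the first argument of .apply() becomes `this` inside the callee, and for slice the receiver's elements reach the result. A minimal runnable sketch of that flow (using an array source for illustration):

```javascript
const source = ["tainted"]; // stands in for a tainted array-like value

// `source` becomes `this` inside slice, so its elements flow into the copy.
const copy = Array.prototype.slice.apply(source);
```

Any taint step for .apply() that only covers the argsArray will miss this path, since the tainted value never appears in the second argument.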

I also noticed weird taint propagation for .call. The line var sink7 = foo1_call(["", source]); is not detected by TestDataFlowConfiguration (this configuration extends DataFlow::Configuration) but is detected by TestTaintTrackingConfiguration (which is a TaintTracking::Configuration). I did not think this difference should affect .call behavior and arrays.
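For context, the shape of the test in question is presumably something like the following (the names foo1, foo1_call, and sink7 are from the comment; the function bodies are guesses for illustration):

```javascript
const source = "tainted"; // stands in for a tainted value

function foo1(a, b) {
  return b; // the second argument reaches the return value
}

// Spreads the array elements into explicit .call() arguments.
function foo1_call(arr) {
  return foo1.call(null, arr[0], arr[1]);
}

var sink7 = foo1_call(["", source]);
```

The flow here involves a store into an array literal followed by a load via arr[1], which may explain why the taint-tracking configuration (with its extra read/store steps) finds it while the plain data-flow configuration does not.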

@erik-krogh
Contributor

erik-krogh commented Sep 10, 2021

Looking forward to the perf tests results.

I got some numbers from my experiment.
It's a little better than your approach; the performance penalty is a bit smaller.
(I reverted your i < 5 and change in my experiment; that might account for some of it.)

The worst case is still bwip-js, but now with only a ~14% performance decrease.
I haven't looked into why there is a performance decrease, or whether my approach can be improved somehow.

Good point about this.
That should be relatively easy to add (some) support for in argumentPassing.
You can look into PartialInvokeNode; that might allow you to get more complete support.

@esbena
Contributor

esbena commented May 9, 2022

Marking the PR as a draft due to inactivity, but feel free to undraft when you have implemented the planned changes.

@esbena esbena marked this pull request as draft May 9, 2022 13:37